Abstract

Bailey showed that the general pointwise forecasting problem for stationary
and ergodic time series has a negative solution. However, it is known
that for Markov chains the problem can be solved. Morvai showed
that there is a stopping time sequence $\{\lambda_n\}$ such that
$P(X_{\lambda_n+1} = 1 \mid X_0, \dots, X_{\lambda_n})$ can be estimated from
samples $(X_0, \dots, X_{\lambda_n})$ such that
the difference between the conditional probability and the estimate
vanishes along these stopping times for all stationary and ergodic
binary time series. We will show that it is not possible to estimate the
above conditional probability along a stopping time sequence for all
stationary and ergodic binary time series in a pointwise sense such
that if the time series turns out to be a Markov chain, the predictor
will predict eventually for all n.
1 Introduction and Statement of Results

Cover [2] posed the following fundamental problem concerning forecasting for
stationary and ergodic binary time series $\{X_n\}_{n=-\infty}^{\infty}$. (Note
that a stationary time series $\{X_n\}_{n=0}^{\infty}$ can be extended to be a
two-sided stationary time series $\{X_n\}_{n=-\infty}^{\infty}$.)

Problem 1 

Is there an estimation scheme $f_n$ for the value
$P(X_{n+1} = 1 \mid X_0, X_1, \dots, X_n)$ such that $f_n$ depends solely on
the data segment $(X_0, X_1, \dots, X_n)$ and

\[ \lim_{n \to \infty} |f_n(X_0, X_1, \dots, X_n) - P(X_{n+1} = 1 \mid X_0, X_1, \dots, X_n)| = 0 \]

almost surely for all stationary and ergodic binary time series
$\{X_n\}_{n=-\infty}^{\infty}$?

This problem was answered in the negative by Bailey [1], that is, he showed
that there is no such scheme. (Also see Ryabko [10], Györfi, Morvai, Yakowitz
[5] and Weiss [11].)

Morvai [8] considered the following modification of Problem 1.

Problem 2

Are there a strictly increasing sequence of stopping times $\{\lambda_n\}$ and
estimators $\{h_n(X_0, \dots, X_{\lambda_n})\}$ such that for all stationary
ergodic binary time series $\{X_n\}$ the estimator $h_n$ is consistent at the
stopping times $\lambda_n$, that is,

\[ \lim_{n \to \infty} |h_n(X_0, \dots, X_{\lambda_n}) - P(X_{\lambda_n+1} = 1 \mid X_0, \dots, X_{\lambda_n})| = 0 \]

almost surely?

Morvai [8] constructed a scheme that solves Problem 2. Unfortunately, his
stopping times grow extremely rapidly, so that scheme is not practical at
all.

Let $X^{*-}$ be the set of all one-sided binary sequences, that is,

\[ X^{*-} = \{(\dots, x_{-1}, x_0) : x_i \in \{0, 1\} \text{ for all } -\infty < i \le 0\}. \]

Define the distance $d^*(\cdot, \cdot)$ on $X^{*-}$ as follows. Let

\[ d^*((\dots, x_{-1}, x_0), (\dots, y_{-1}, y_0)) = \sum_{i=0}^{\infty} 2^{-i-1} |x_{-i} - y_{-i}|. \]
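To make the metric concrete, the following is a minimal sketch that evaluates $d^*$ on pasts truncated to a finite length. The function name `d_star` and the list convention (index i holds the symbol at time -i) are assumptions made only for this illustration.

```python
def d_star(x_past, y_past):
    """Truncated d*: x_past[i] holds x_{-i}, so index 0 is the present."""
    if len(x_past) != len(y_past):
        raise ValueError("both pasts must be truncated to the same length")
    return sum(abs(a - b) / 2 ** (i + 1)
               for i, (a, b) in enumerate(zip(x_past, y_past)))


# Two pasts that agree on the three most recent symbols are within 2^{-3}.
x = [1, 0, 0, 1, 1]
y = [1, 0, 0, 0, 0]
print(d_star(x, y))  # 1/16 + 1/32 = 0.09375
```

Two pasts are therefore close in $d^*$ exactly when they agree on a long stretch of the most recent symbols, which is the sense in which continuity is used below.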
Definition The conditional probability $P(X_1 = 1 \mid \dots, X_{-1}, X_0)$ is
almost surely continuous if for some set $C \subseteq X^{*-}$ which has
probability one the conditional probability
$P(X_1 = 1 \mid \dots, X_{-1}, X_0)$ restricted to this set $C$ is
continuous with respect to the metric $d^*(\cdot, \cdot)$.

The processes with almost surely continuous conditional probability
generalize the processes for which it is actually continuous; these are
essentially the Random Markov Processes of Kalikow [6], or the continuous
g-measures studied by Mike Keane [7].

A more moderate growth (compared to Morvai [8]) was achieved by
Morvai and Weiss [9], but the consistency was secured only for the subclass
of all stationary and ergodic binary time series with almost surely continuous
conditional probability $P(X_1 = 1 \mid \dots, X_{-1}, X_0)$.

However, for the class of all stationary and ergodic Markov chains of some
finite order Problem 1 can be solved. Indeed, if the time series is a Markov
chain of some finite order, we can estimate the order (e.g. as in Csiszár,
Shields [3] and Csiszár [4]) and count frequencies of blocks with length equal
to the order. Bailey showed that one cannot test for membership in this class.
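The block-counting step can be sketched as follows. This is only an illustration under the assumption that the order is already known; estimating the order from the data, as in [3], is not attempted here, and the function name `block_frequency_estimate` is hypothetical.

```python
from collections import defaultdict


def block_frequency_estimate(bits, order):
    """Estimate P(next = 1 | last `order` bits) by counting blocks.

    `order` is taken as given (an assumption of this sketch).
    Returns None if the current context has not been seen before.
    """
    if order < 0 or len(bits) < order:
        raise ValueError("need at least `order` observations")
    seen = defaultdict(int)   # times each context was followed by a symbol
    ones = defaultdict(int)   # times each context was followed by a 1
    for t in range(order, len(bits)):
        context = tuple(bits[t - order:t])
        seen[context] += 1
        ones[context] += bits[t]
    current = tuple(bits[len(bits) - order:])
    if seen[current] == 0:
        return None
    return ones[current] / seen[current]


# In an alternating sequence a 0 has always been followed by a 1.
print(block_frequency_estimate([0, 1] * 10 + [0], 1))  # 1.0
```

For a stationary and ergodic Markov chain of that order, these relative frequencies converge almost surely to the true transition probabilities by the ergodic theorem.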

It is conceivable that one could improve the result of Morvai [8] or Morvai
and Weiss [9] so that if the process happens to be Markovian then one
eventually estimates at all times. Our purpose in this paper is to show that
this is not possible. This puts some new restrictions on what can be achieved
in estimating along stopping times.

Theorem 1 For any strictly increasing sequence of stopping times
$\{\lambda_n\}$ such that for all stationary and ergodic binary Markov chains
with arbitrary finite order, eventually $\lambda_{n+1} = \lambda_n + 1$, and
for any sequence of estimators $\{h_n(X_0, \dots, X_{\lambda_n})\}$ there is a
stationary and ergodic binary time series $\{X_n\}$ with almost surely
continuous conditional probability $P(X_1 = 1 \mid \dots, X_{-1}, X_0)$,
such that

\[ P\left( \limsup_{n \to \infty} |h_n(X_0, \dots, X_{\lambda_n}) - P(X_{\lambda_n+1} = 1 \mid X_0, \dots, X_{\lambda_n})| > 0 \right) > 0. \]

Remark: Bailey [1] among other things proved that there is no sequence
of functions $\{e_n(X_0^{n-1})\}$ which, for all stationary and ergodic time
series, would be eventually 1 if the series turns out to be a Markov chain
and eventually 0 otherwise.
(That is, there is no test for the Markov property.) This result does not imply
ours. On the other hand, our result implies Bailey's. (Indeed, if there were
a test for Markov chains in the above sense, we could apply the estimator in
Morvai [8] or Morvai and Weiss [9] if the time series is not a Markov chain of
some finite order, and if the time series is a Markov chain of some finite order
we could estimate the order of the Markov chain (e.g. as in Csiszár, Shields
[3] or Csiszár [4]) and count frequencies of blocks with length equal to the
order.)

Bailey [1] and Ryabko [10] proved less than our Theorem 1. They proved
the nonexistence of the desired estimator when the estimator should work
for all stationary and ergodic binary time series and when all
$\lambda_n = n$, that is, when we always require good prediction.

2 Proof of Theorem 1

Proof: 

The proof mainly follows the footsteps of Ryabko [10] and Györfi, Morvai,
Yakowitz [5] with alterations where necessary. For $m \le n$ let
$X_m^n = (X_m, \dots, X_n)$. First we define the same Markov chain as in
Ryabko [10], which serves as the technical tool for the construction of our
counterexample. Let the state space $S$ be the non-negative integers. From
state 0 the process certainly passes to state 1 and then to state 2 at the
following epoch. From each state $s \ge 2$, the Markov chain passes either to
state 0 or to state $s + 1$ with equal probabilities 0.5. This construction
yields a stationary and ergodic Markov chain $\{M_i\}$ with stationary
distribution

\[ P(M_0 = 0) = P(M_0 = 1) = \frac{1}{4} \]

and

\[ P(M_0 = i) = \frac{1}{2^i} \quad \text{for } i \ge 2. \]
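As a sanity check on the stationary distribution, the balance equations of the chain (0 to 1 and 1 to 2 with probability one, and from each state s at least 2 to 0 or s + 1 with probability 1/2 each) can be verified in exact arithmetic. This is a minimal sketch; the helper names `pi` and `inflow` and the truncation level `cutoff` are assumptions introduced for the illustration.

```python
from fractions import Fraction


def pi(i):
    """Stationary probability of state i: 1/4 for i = 0, 1 and 2^{-i} for i >= 2."""
    return Fraction(1, 4) if i < 2 else Fraction(1, 2 ** i)


def inflow(i, cutoff):
    """Stationary mass flowing into state i in one step.

    For i = 0 the infinite sum over s >= 2 is truncated at `cutoff`
    (an illustration parameter, not part of the construction).
    """
    if i == 0:
        return sum(pi(s) / 2 for s in range(2, cutoff + 1))
    if i in (1, 2):
        return pi(i - 1)          # 0 -> 1 and 1 -> 2 are deterministic
    return pi(i - 1) / 2          # s -> s + 1 with probability 1/2


cutoff = 40
# Balance holds exactly for every state i >= 1 ...
assert all(inflow(i, cutoff) == pi(i) for i in range(1, cutoff))
# ... and for state 0 up to the truncated tail 2^{-(cutoff + 1)}.
assert pi(0) - inflow(0, cutoff) == Fraction(1, 2 ** (cutoff + 1))
```

The only approximation is the truncated sum for state 0, whose error is the explicit tail term checked in the last line.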

Let $\psi_k$ denote the first time of occurrence of state $2k$:

\[ \psi_k = \min\{i \ge 0 : M_i = 2k\}. \]

Note that if $M_0 = 0$ then $M_i \le 2k$ for $0 \le i \le \psi_k$. For each
$0 \le j < \infty$ we will define a binary-valued Markov chain
$\{X_i^{(j)}\}$ with some finite order, which we denote as
$X_i^{(j)} = f^{(j)}(M_i)$, where $f^{(j)}$ will be a $\{0, 1\}$-valued
function of the state space $S$. We will also define a process $\{X_i\}$ which
we denote as $X_i = f^{(\infty)}(M_i)$, where $f^{(\infty)}$ is also a
binary-valued function of the state space $S$, and the time series $\{X_i\}$
will serve as the stationary (non-Markov) unpredictable process. For all
$0 \le j \le \infty$, let $f^{(j)}(0) = 0$, $f^{(j)}(1) = 0$, and
$f^{(j)}(s) = 1$ for all even states $s \ge 2$. Note that so far we have only
defined $f^{(j)}(\cdot)$ partially. We will define the values for the
remaining states later on. A feature of this definition of $f^{(j)}(\cdot)$ is
that whenever $X_n^{(j)} = 0$, $X_{n+1}^{(j)} = 0$, $X_{n+2}^{(j)} = 1$ we
know that $M_n = 0$, and vice versa.
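The correspondence between the pattern 0, 0, 1 and visits to state 0 can be checked on a simulated path. The sketch below starts the chain at state 0 rather than from the stationary distribution, and the labels given to odd states are arbitrary placeholders for the values fixed later in the proof; both choices, and the names `simulate_chain` and `make_labeling`, are assumptions of this illustration.

```python
import random


def simulate_chain(steps, rng):
    """Path started at state 0: 0 -> 1 -> 2, then s -> 0 or s + 1 with prob. 1/2."""
    path = [0]
    while len(path) < steps:
        s = path[-1]
        if s < 2:
            path.append(s + 1)
        else:
            path.append(0 if rng.random() < 0.5 else s + 1)
    return path


def make_labeling(odd_values):
    """f(0) = f(1) = 0 and f(s) = 1 for even s >= 2; odd states >= 3 from a dict.

    `odd_values` is a hypothetical stand-in for the values chosen later;
    odd states missing from it default to 1.
    """
    def f(s):
        if s < 2:
            return 0
        if s % 2 == 0:
            return 1
        return odd_values.get(s, 1)
    return f


rng = random.Random(0)
f = make_labeling({3: 0, 5: 1, 7: 0, 9: 0})
path = simulate_chain(5000, rng)
bits = [f(s) for s in path]
# The pattern 0, 0, 1 starts at time n exactly when the chain is in state 0.
for n in range(len(bits) - 2):
    assert (bits[n:n + 3] == [0, 0, 1]) == (path[n] == 0)
```

The check passes for any labeling of the odd states: a 0 emitted at state 1 is followed by the 1 of state 2, and a 0 emitted at an odd state s of at least 3 is followed either by the 1 of the even state s + 1 or by the two zeros of states 0 and 1.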

Now observe that if for a certain $0 \le j < \infty$ there is an index $K_j$
such that $f^{(j)}(i) = 1$ for all $i \ge K_j$, then the defined process
$\{X_n^{(j)}\}$ is a binary Markov chain with order not greater than $K_j$.
(Indeed, the probabilities
$P(X_n^{(j)} = 1 \mid X_0^{(j)}, \dots, X_{n-1}^{(j)})$ are determined by the
last $K_j$ bits $(X_{n-K_j}^{(j)}, \dots, X_{n-1}^{(j)})$. To see this,
consider the following cases.

I. If for some $1 \le i \le K_j - 2$ we have $X_{n-i}^{(j)} = 1$ and
$X_{n-1-i}^{(j)} = X_{n-2-i}^{(j)} = 0$, then we can detect that
$M_{n-i} = 2$, $M_{n-1-i} = 1$ and $M_{n-2-i} = 0$, and the
conditional probability does not depend on previous values.

II. If there is no $1 \le i \le K_j - 2$ such that $X_{n-i}^{(j)} = 1$ and
$X_{n-1-i}^{(j)} = X_{n-2-i}^{(j)} = 0$, we have three sub-cases.

II/1. If $X_{n-1}^{(j)} = 1$ then $M_{n-1} \ge K_j$. In this case the
conditional probability is 0.5.

II/2. If $X_{n-2}^{(j)} = X_{n-1}^{(j)} = 0$ then $M_{n-1} = 1$ and the
conditional probability is 1.

II/3. If $X_{n-2}^{(j)} = 1$ and $X_{n-1}^{(j)} = 0$ then $M_{n-1} = 0$ and so
the conditional probability is 0.)

Now let $f^{(0)}(2k + 1) = 1$ for all $k \ge 1$, so that the function
$f^{(0)}$ is fully defined. Since $f^{(0)}(i)$ is eventually 1, the defined
process $\{X_n^{(0)}\}$ is a stationary ergodic binary Markov chain with some
finite order.

For the function $f^{(j)}$ and index $2k$, if $f^{(j)}(i)$ is defined for all
$0 \le i \le 2k$, then it is easy to see that if $M_0 = 0$ (that is,
$f^{(j)}(M_0) = 0$, $f^{(j)}(M_1) = 0$, $f^{(j)}(M_2) = 1$) then
$M_i \le 2k$ for $0 \le i \le \psi_k$ and the mapping

\[ M_0^{\psi_k} \to (f^{(j)}(M_0), \dots, f^{(j)}(M_{\psi_k})) \]
is invertible. If we let $\lambda_n$ operate on the process $\{X_n^{(j)}\}$,
define

\[ A_j(k) = \{M_0 = 0, \ \psi_k = \lambda_n(X_0^{(j)}, X_1^{(j)}, \dots) \text{ for some } n\}. \]

Thus as soon as $f^{(j)}(i)$ is defined for all $0 \le i \le 2k$, the set
$A_j(k)$ is also well defined; it is measurable with respect to
$M_0^{\psi_k}$ and depends on the state $2k$ and on the index $j$, which
selects the process $\{X_n^{(j)}\}$ on which the stopping times
$\{\lambda_n\}$ operate.

Let $N_{-1} = 1$. Notice that $A_0(k)$ is well defined for all $k$. Now we
define $f^{(j)}$ by induction. Assume that for $0 \le i \le j - 1$ we have
already defined a strictly increasing sequence of integers $N_{i-1}$, and
functions $f^{(i)}$ which are eventually constant.

Now we define $f^{(j)}$. Since by assumption $\{X_n^{(j-1)}\}$ is a
stationary and ergodic binary-valued Markov process with some finite order,
the estimator is assumed to predict eventually on this process, and there is
an $N_{j-1} > N_{j-2}$ such that

\[ P(A_{j-1}(N_{j-1})) > \frac{1}{8}. \]

Now for each $j \le l \le \infty$ define $f^{(l)}(2m + 1)$ for the segment
$N_{j-2} \le m < N_{j-1}$ as follows:

\[ f^{(l)}(2m + 1) = f^{(j-1)}(2m + 1). \]

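Notice that now $A_j(N_{j-1})$ is well defined and coincides with
$A_{j-1}(N_{j-1})$. We will define $f^{(j)}(2N_{j-1} + 1)$ maliciously. With
$n$ denoting the index for which $\lambda_n = \psi_{N_{j-1}}$, let

\[ B_j^+ = A_j(N_{j-1}) \cap \{h_n(f^{(j)}(M_0), \dots, f^{(j)}(M_{\psi_{N_{j-1}}})) \ge 1/4\} \]

and

\[ B_j^- = A_j(N_{j-1}) \cap \{h_n(f^{(j)}(M_0), \dots, f^{(j)}(M_{\psi_{N_{j-1}}})) < 1/4\}. \]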

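Now notice that the sets $B_j^+$ and $B_j^-$ do not depend on the future
values of $f^{(j)}(2r + 1)$ for $r \ge N_{j-1}$. Since they partition
$A_j(N_{j-1})$, which has probability greater than 1/8, one of the two sets
$B_j^+$, $B_j^-$ has probability at least 1/16. Now we specify
$f^{(j)}(2N_{j-1} + 1)$. Let $f^{(j)}(2N_{j-1} + 1) = 1$, $I_j = B_j^-$ if
$P(B_j^-) \ge P(B_j^+)$, and let $f^{(j)}(2N_{j-1} + 1) = 0$, $I_j = B_j^+$ if
$P(B_j^-) < P(B_j^+)$.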
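Because of the construction of $\{M_i\}$, on the event $I_j$,

\[ P(X_{\psi_{N_{j-1}}+1}^{(j)} = 1 \mid X_0^{(j)}, \dots, X_{\psi_{N_{j-1}}}^{(j)}) \]
\[ = f^{(j)}(2N_{j-1} + 1) \, P(X_{\psi_{N_{j-1}}+1}^{(j)} = f^{(j)}(2N_{j-1} + 1) \mid X_0^{(j)}, \dots, X_{\psi_{N_{j-1}}}^{(j)}) \]
\[ = f^{(j)}(2N_{j-1} + 1) \, P(M_{\psi_{N_{j-1}}+1} = 2N_{j-1} + 1 \mid M_0^{\psi_{N_{j-1}}}) \]
\[ = 0.5 \, f^{(j)}(2N_{j-1} + 1). \]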

The difference between the estimate and the conditional probability is at
least 1/4 on the set $I_j$, and this event occurs with probability not less
than 1/16.
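The malicious choice can be summarized in a few lines. This is a sketch of the selection rule only, with the probabilities of the two events supplied as plain numbers; the function names `malicious_label` and `error_on_event` are hypothetical.

```python
def malicious_label(p_plus, p_minus):
    """Pick f(2N + 1) against the estimator, following the rule in the proof.

    p_plus  = P(B+): the estimate is at least 1/4 at the stopping time;
    p_minus = P(B-): the estimate is below 1/4.
    Returns the label and the name of the event I_j on which the error is large.
    """
    return (1, "B-") if p_minus >= p_plus else (0, "B+")


def error_on_event(estimate, label):
    """|estimate - conditional probability|, where the latter is 0.5 * label."""
    return abs(estimate - 0.5 * label)


# Whatever the estimator outputs, the label paired with its event costs >= 1/4.
for k in range(101):
    estimate = k / 100
    label = 1 if estimate < 0.25 else 0   # label used on B- and on B+ respectively
    assert error_on_event(estimate, label) >= 0.25
```

Since the more probable of the two events is the one selected, the error of at least 1/4 is incurred on a set of probability at least half of that of their union.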
Now for all $m > N_{j-1}$ define

\[ f^{(j)}(2m + 1) = 1. \]

In this way, $\{X_n^{(j)}\}$ is also a stationary and ergodic binary-valued
Markov chain.

Now by induction, we have defined all the functions $f^{(j)}$ for
$0 \le j < \infty$. Since
$f^{(\infty)}(m) = f^{(j)}(m) = f^{(j-1)}(m)$ for all
$0 \le m \le 2N_{j-1}$, we have also defined $f^{(\infty)}$.

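Finally, by Fatou's Lemma,

\[ P\left( \limsup_{n \to \infty} \{ |h_n(X_0^{\lambda_n}) - P(X_{\lambda_n+1} = 1 \mid X_0^{\lambda_n})| \ge 1/4 \} \right) \ge P(\limsup_{j \to \infty} I_j) \ge \limsup_{j \to \infty} P(I_j) \ge \frac{1}{16}. \]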

Concerning the conditional probability $P(X_1 = 1 \mid X_{-\infty}^0)$, observe
that as soon as one finds the pattern '001' in the sequence $X_{-\infty}^0$,
the conditional probability does not depend on previous values. The
probability of the occurrence of '001' in the past is one, since the original
Markov chain is ergodic and our process is therefore also ergodic. Thus the
conditional probabilities are almost surely continuous. The proof of
Theorem 1 is complete.